Abstract: A random walk (or a Wiener process), possibly with drift, is observed in a noisy or delayed fashion. The problem considered in this paper is to estimate the first time $\tau$ the random walk reaches a given level. Specifically, the $p$-moment ($p \ge 1$) optimization problem $\inf_\eta E|\eta - \tau|^p$ is investigated, where the infimum is taken over the set of stopping times that are defined on the observation process.

When there is no drift, optimal stopping rules are characterized for both types of observations. When there is a drift, upper and lower bounds on $\inf_\eta E|\eta - \tau|^p$ are established for both types of observations. The bounds are tight in the large-level regime for noisy observations and in the large-level-large-delay regime for delayed observations. Notably, for noisy observations there exists an asymptotically optimal stopping rule that is a function of a single observation.

Simulation results are provided that corroborate the validity 
of the results for non-asymptotic settings. 

Index Terms: change-point detection problem, estimation, optimal stopping theory, random walk, stopping time, tracking stopping time (TST), Wiener process



I. Introduction 

Suppose $X = \{X_t\}_{t \ge 0}$ is a stochastic process and $\tau$ a stopping time defined over $X$. A statistician has access to $X$ only through correlated observations $Y = \{Y_t\}_{t \ge 0}$ and wishes to find a stopping time $\eta$ defined over $Y$ that gets as close as possible to $\tau$, for instance, so as to minimize some average absolute moment $E|\eta - \tau|^p$. This general formulation was introduced in [9] as the Tracking Stopping Time (TST) problem, and an early instance of it where $Y = X$ and where $\tau$ is a randomized stopping time was investigated in [8].

The TST problem generalizes the long-studied Bayesian change-point detection problem (see, e.g., [13] and the books [10] and [1] for surveys on theory and applications of the change-point problem).

In the Bayesian change-point problem, there is a random variable $\theta$, taking on values in the positive integers, and two probability distributions: $P_0$, the "nominal" distribution, and $P_1$, the "alternative" distribution. Under $P_0$, the conditional density function of $Y_t$ given $Y_0, Y_1, \ldots, Y_{t-1}$ is $f_0(Y_t \mid Y_0, Y_1, \ldots, Y_{t-1})$, for every $t \ge 0$. Under $P_1$, the conditional density function of $Y_t$ given $Y_0, Y_1, \ldots, Y_{t-1}$ is $f_1(Y_t \mid Y_0, Y_1, \ldots, Y_{t-1})$, for every $t \ge 0$. The observed process is distributed according to $P^\theta$, which assigns the conditional density functions of $P_0$ for all $t < \theta$, and the conditional density functions of $P_1$ for all $t \ge \theta$. The Bayesian change-point problem typically consists in finding a stopping time $\eta$, with respect to $\{Y_t\}$, that minimizes some (loss) function of the delay $\eta - \theta$.

To see that the Bayesian change-point problem can always be formulated as a TST problem, it suffices to define the process $X = \{X_t\}_{t \ge 0}$ as $X_t = 0$ for $t < \theta$ and $X_t = 1$ for $t \ge \theta$. The Bayesian change-point problem becomes the TST problem which consists in tracking $\theta$ (now defined as a stopping time with respect to $X$) through $Y$.

The difference between the Bayesian change-point problem and the TST problem lies in the equality

$$P(\tau = k \mid \tau > n, y^n) = P(\tau = k \mid \tau > n), \qquad k > n,$$

which always holds for the former but need not hold for the latter [9]. In other words, for TST problems past observations are in general useful for estimating the future value of $\tau$, in contrast with Bayesian change-point problems. For specific applications of the TST problem formulation related to monitoring, communication, and forecasting we refer to [9, Section I].

In [9], through a computer science approach, a general
algorithmic solution is proposed for constructing optimal 
"trackers" for the cases where X and Y are processes defined 
over finite alphabets and r is bounded. What motivated an 
algorithmic approach is that the TST problem generalizes 
the Bayesian change-point problem for which general closed- 
form analytical solutions have been reported only for specific 
asymptotic regimes, typically the vanishing false-alarm regime 
(see, e.g., Q). Non-asymptotic closed-form solutions have 
been obtained essentially for i.i.d. cases where, conditioned on the change-point value, observations are independent with common distribution $P_0$ and $P_1$ before and after the change, respectively (see, e.g., [11], [12]).$^2$

Two natural TST settings are the ones where the observation process $Y$ is a noisy or delayed version of $X$. In this paper we investigate both situations when $X$ is a Gaussian random walk (or a Wiener process), possibly with drift, and $\tau = \tau_\ell$ is the first time $X$ reaches some given level $\ell$. For noisy and delayed observations, we establish lower bounds on

$$\inf_\eta E|\eta - \tau_\ell|^p, \qquad p \ge 1,$$



2 An exception is [14], which considers Markov chain distributions, but of finite state.



where the infimum is over all stopping times with respect to $Y$, then exhibit stopping rules that achieve these bounds in the large-threshold regime and large-delay-large-threshold regime, respectively. For noisy observations, two complementary asymptotically optimal stopping rules are proposed. One depends on a single observation at some fixed time, but it approaches optimality only for very large thresholds. The other performs a sequential minimum mean square error (mmse) estimate of $X_t$ given $Y_t$, $t = 0, 1, \ldots$, and stops as soon as this estimate reaches level $\ell$. As such, the second stopping time needs many more observations, roughly $\ell/s$, but performs significantly better in the non-asymptotic regime.

In the particular case where $X$ does not drift, we characterize $\inf_\eta E|\eta - \tau_\ell|^p$ non-asymptotically for both the noisy and the delayed observation cases.

Section II contains the main results and Section III is devoted to the proofs.

II. Results 
Consider the discrete-time process

$$X: \quad X_0 = 0, \qquad X_t = \sum_{i=1}^{t} V_i + s t, \quad t \ge 1,$$

where $s \ge 0$ is some known constant, where $V_1, V_2, \ldots$ are i.i.d. $\sim \mathcal{N}(0, 1)$ (zero mean unit variance Gaussian random variables), and consider the first-passage time

$$\tau_\ell = \inf\{t \ge 0 : X_t \ge \ell\}$$

for some known fixed threshold level $\ell \ge 0$.
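As an illustration of the definitions above, the following minimal sketch simulates the random walk $X$ and returns its first-passage time $\tau_\ell$. The function name `first_passage_time`, the `max_steps` safeguard, and the parameter values are illustrative assumptions, not part of the paper.

```python
import random

def first_passage_time(level, drift, rng, max_steps=1_000_000):
    """Return tau = inf{t >= 0 : X_t >= level} for X_t = V_1 + ... + V_t + drift * t."""
    x, t = 0.0, 0
    while x < level:
        if t >= max_steps:
            # Safeguard (an assumption of this sketch): with zero drift the
            # passage time has a heavy tail and may be extremely large.
            raise RuntimeError("level not reached within max_steps")
        t += 1
        x += drift + rng.gauss(0.0, 1.0)  # increment is V_t + s
    return t

rng = random.Random(0)
samples = [first_passage_time(level=50.0, drift=1.0, rng=rng) for _ in range(2000)]
mean_tau = sum(samples) / len(samples)
print(mean_tau)  # close to level / drift, up to the overshoot
```

With positive drift the sample mean is close to $\ell/s$, slightly above it because of the excess over the boundary.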

Given sequential observations of a process $Y = \{Y_t\}_{t \ge 0}$ correlated to $X$, we consider the optimization problem

$$\inf_\eta E|\eta - \tau_\ell|^p, \qquad p \ge 1, \tag{1}$$

where the infimum is over all stopping times $\eta$ defined with respect to the natural filtration induced by $Y$.$^3$

The results, presented in the next two subsections, relate to 
the situations where Y is either a noisy version of X, or a 
delayed version of X. 

Throughout the paper the following notational conventions are adopted. We use $\eta$ to denote a function of $Y = Y_0^\infty$. When $\eta$ has no argument, such as in (1), we mean that $\eta$ is a stopping time with respect to $Y$. Instead, if $\eta$ has an argument, we mean that $\eta$ is a function of its argument which need not be a stopping time with respect to $Y$. For example, $\eta(Y_a^b)$, with $0 \le a \le b \le \infty$, refers to a function of observations $Y_a^b = Y_a, Y_{a+1}, \ldots, Y_b$.

Further, we frequently omit arguments of functions (or estimators) that appear in expressions to be optimized. For instance, instead of

$$\inf_{\eta(Y_a^b)} E|\eta(Y_a^b) - \tau_\ell|^p$$

we simply write

$$\inf_{\eta(Y_a^b)} E|\eta - \tau_\ell|^p$$



3 We consider only non-randomized stopping times since this does not induce a loss of optimality with respect to (1) (see, e.g., [4, Chap. 8.5] where randomization is shown to be useless for general statistical decision problems).



to denote an optimization over estimators of $\tau_\ell$ that depend only on observations $Y_a^b$.

A. Noisy observations 

Consider the observation process

$$Y: \quad Y_0 = 0, \qquad Y_t = X_t + \varepsilon \sum_{i=1}^{t} W_i, \quad t \ge 1,$$

where $W_1, W_2, \ldots$ are i.i.d. $\sim \mathcal{N}(0, 1)$ and where $\varepsilon \ge 0$ is some known constant. The observation noises $\{W_i\}$ are supposed to be independent of $\{V_i\}$.

Note that if $\ell = 0$ or if $\varepsilon = 0$ (i.e., $X = Y$), (1) is equal to zero by setting $\eta = 0$ and $\eta = \tau_\ell$, respectively.

Interestingly, when $\ell > 0$, $\varepsilon > 0$, and $s = 0$, it turns out that it is impossible to track $\tau_\ell$, even having access to the entire observation process $Y_0^\infty$:

Theorem 1 (Noisy observations, $s = 0$, [2, Proposition 2.1.ii.]). For $s = 0$, $\varepsilon > 0$, $\ell > 0$, and $p \ge 1/2$, we have$^4$

$$E|\eta(Y_0^\infty) - \tau_\ell|^p = \infty$$

for any estimator $\eta(Y_0^\infty)$ of $\tau_\ell$.

We now consider the case $\ell > 0$, $\varepsilon > 0$, and $s > 0$. The next result characterizes (1) in the limit $\ell \to \infty$ and provides two asymptotically optimal stopping rules. One of these rules is non-sequential in the sense that it depends on a single observation.
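The sequential stopping rule is defined as

$$\eta^{s}_\ell \overset{\text{def}}{=} \inf\{t \ge 0 : \hat X_t \ge \ell\} \tag{2}$$

where $\hat X_0 = 0$ and where

$$\hat X_t \overset{\text{def}}{=} \frac{1}{1+\varepsilon^2}\, Y_t + \frac{\varepsilon^2}{1+\varepsilon^2}\, s t, \qquad t \ge 1, \tag{3}$$

is the mmse estimator of $X_t$ given observation $Y_t$.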
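The sequential rule can be sketched as follows. This is a minimal illustration under the model above, using the estimator $\hat X_t = Y_t/(1+\varepsilon^2) + \varepsilon^2 s t/(1+\varepsilon^2)$ (the conditional mean of $X_t$ given $Y_t$ for this Gaussian model); the names `mmse_estimate` and `simulate_sequential` are hypothetical, and the sketch assumes $s > 0$ and $\ell > 0$ so that both times are finite.

```python
import random

def mmse_estimate(y_t, t, drift, eps):
    """MMSE estimate of X_t from Y_t alone: Y_t/(1+eps^2) + eps^2/(1+eps^2) * drift * t."""
    weight = 1.0 / (1.0 + eps * eps)
    return weight * y_t + (1.0 - weight) * drift * t

def simulate_sequential(level, drift, eps, rng):
    """Simulate one path of (X, Y); return (tau, eta_seq).

    tau     : first time X_t >= level
    eta_seq : first time the MMSE estimate of X_t reaches level
    Assumes drift > 0 and level > 0.
    """
    x = y = 0.0
    t = 0
    tau = eta_seq = None
    while tau is None or eta_seq is None:
        t += 1
        v = rng.gauss(0.0, 1.0)   # increment noise of X
        w = rng.gauss(0.0, 1.0)   # observation noise
        x += drift + v
        y += drift + v + eps * w  # Y_t = X_t + eps * (W_1 + ... + W_t)
        if tau is None and x >= level:
            tau = t
        if eta_seq is None and mmse_estimate(y, t, drift, eps) >= level:
            eta_seq = t
    return tau, eta_seq

print(simulate_sequential(level=100.0, drift=1.0, eps=0.5, rng=random.Random(0)))
```

When `eps` is zero the estimate coincides with the observation, so the two times agree on every path.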

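The non-sequential stopping rule is defined as follows. Let$^5$

$$\eta^{ns}_\ell \overset{\text{def}}{=} t^* + \left\lfloor \frac{(\ell - \hat X_{t^*})_+}{s} \right\rfloor \tag{4}$$

with

$$t^* \overset{\text{def}}{=} \left\lfloor \ell/s - (\ell/s)^q \right\rfloor \tag{5}$$

for some arbitrary constant $q \in (1/2, 1)$. Notice that $\eta^{ns}_\ell$ is only a function of observation $Y_{t^*}$.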
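A minimal sketch of the single-observation rule, assuming it takes the form $t^* + \lfloor(\ell - \hat X_{t^*})_+/s\rfloor$ with $t^* = \lfloor \ell/s - (\ell/s)^q\rfloor$, as suggested by (4), (5), and the proof outline around (17). The function name `non_sequential_rule` and the argument names are hypothetical.

```python
import math

def non_sequential_rule(y_at_t_star, level, drift, eps, q=0.51):
    """Return (t_star, eta_ns) computed from the single observation Y_{t*}.

    Assumes drift > 0 and 1/2 < q < 1.
    """
    ratio = level / drift
    t_star = math.floor(ratio - ratio ** q)
    # MMSE estimate of X_{t*} given Y_{t*} (same form as the sequential estimator).
    weight = 1.0 / (1.0 + eps * eps)
    x_hat = weight * y_at_t_star + (1.0 - weight) * drift * t_star
    # Remaining distance to the level, converted to time at speed `drift`.
    eta_ns = t_star + math.floor(max(0.0, level - x_hat) / drift)
    return t_star, eta_ns

print(non_sequential_rule(890.0, level=1000.0, drift=10.0, eps=0.0))  # (89, 100)
```

The rule inspects the observation once, at time $t^*$, and then commits to a stopping time; no later observation is used.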

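Theorem 2 (Noisy observations, $s > 0$). Fix $0 < \varepsilon < \infty$, $0 < s < \infty$, and $p \ge 1$. Then, for $\eta = \eta^{s}_\ell$ or $\eta = \eta^{ns}_\ell$,

$$E|\eta - \tau_\ell|^p = (1 + o(1)) \inf_{\eta'(Y_0^\infty)} E|\eta' - \tau_\ell|^p = (1 + o(1))\, C_1(\ell, s, \varepsilon, p) \tag{6}$$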



4 Recall that $\eta(Y_0^\infty)$ denotes an arbitrary function of observations $Y_0^\infty$ which need not be a stopping time, according to our notational convention of the previous section.

5 $x_+$ denotes $\max\{0, x\}$ and $\lfloor x \rfloor$ denotes the integer part of $x$.
Fig. 1. $E|\eta - \tau_\ell|^p / C_1(\ell, s, \varepsilon, p)$ as a function of $\ell$ for $\eta = \eta^{s}_\ell$, $\eta = \eta^{ns}_\ell$, and $\eta = \ell/s$ (marks +, ×, and •, respectively), with $p = 1$, $s = 10$, $\varepsilon = .5$, and $q = .51$.



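as $\ell \to \infty$, where

$$C_1(\ell, s, \varepsilon, p) \overset{\text{def}}{=} \left( \frac{\ell \varepsilon^2}{s^3 (1 + \varepsilon^2)} \right)^{p/2} E|N|^p$$

and where $N \sim \mathcal{N}(0, 1)$.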
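For numerical work, the constant $C_1(\ell, s, \varepsilon, p) = (\ell\varepsilon^2/(s^3(1+\varepsilon^2)))^{p/2} E|N|^p$ can be evaluated in closed form using the standard identity $E|N|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$ for a standard Gaussian $N$. The sketch below is illustrative; the function names `abs_normal_moment` and `c1` are hypothetical.

```python
import math

def abs_normal_moment(p):
    """E|N|^p for N ~ N(0, 1), valid for p > -1."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def c1(level, drift, eps, p):
    """Asymptotic p-th absolute moment C_1 for noisy observations (drift > 0)."""
    variance_term = level * eps ** 2 / (drift ** 3 * (1.0 + eps ** 2))
    return variance_term ** (p / 2.0) * abs_normal_moment(p)

# Parameters of Fig. 1 at level 1000: p = 1, s = 10, eps = 0.5.
print(c1(1000.0, 10.0, 0.5, 1))
```

For $p = 2$ the Gaussian moment equals 1, so `c1` reduces to the variance term itself, which gives a quick sanity check.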

Since

$$E|\eta - \tau_\ell|^p \ge \inf_{\eta} E|\eta - \tau_\ell|^p \ge \inf_{\eta(Y_0^\infty)} E|\eta - \tau_\ell|^p,$$

the first equality in (6) says that both stopping rules $\eta^{s}_\ell$ and $\eta^{ns}_\ell$ do as well as the best non-causal estimators of $\tau_\ell$ with access to the entire observation process $Y$, asymptotically. Moreover, note that asymptotic optimality is universal over $p \ge 1$ for $\eta^{s}_\ell$ and universal over both $p$ and $\varepsilon$ for $\eta^{ns}_\ell$, since the former does not depend on $p$ and the latter depends neither on $p$ nor on $\varepsilon$. For $p = 1$, the optimality of $\eta^{s}_\ell$ was established in [2, Theorem 2.3].

Since $\eta^{ns}_\ell$ does not exploit the dependency between $X$ and $Y$ ($\eta^{ns}_\ell$ does not depend on $\varepsilon$), it may be expected that $\eta^{s}_\ell$ performs significantly better than $\eta^{ns}_\ell$ for moderate to low values of $\ell$. In fact, this claim is supported numerically. An illustration is given by Fig. 1, which represents numerical evaluations of

$$\frac{E|\eta - \tau_\ell|^p}{C_1(\ell, s, \varepsilon, p)} \tag{7}$$

as a function of $\ell$ for $\eta \in \{\eta^{s}_\ell, \eta^{ns}_\ell, \ell/s\}$, with parameters $p = 1$, $s = 10$, and $\varepsilon = .5$. The parameter $q$ in the definition of $\eta^{ns}_\ell$ is chosen to be equal to .51. The simulation has a precision of $\delta = .1$ for $\eta = \eta^{s}_\ell$ and $\eta = \eta^{ns}_\ell$, and a precision of $\delta = .5$ for $\eta = \ell/s$. By precision we mean that the numerical evaluation of (7) deviates from it by less than $\delta$ with probability at least $1 - \delta$. Simulation details are provided in the appendix.

We observe that, as $\ell \to \infty$, (7) tends to 1 for both $\eta = \eta^{s}_\ell$ and $\eta = \eta^{ns}_\ell$, as predicted by Theorem 2. However, $\eta^{s}_\ell$ performs significantly better than $\eta^{ns}_\ell$ in the non-asymptotic regime. For instance, for $\ell \approx 1000$, $E|\eta^{s}_\ell - \tau_\ell|$ is roughly a third of $E|\eta^{ns}_\ell - \tau_\ell|$.

More generally, simulation results suggest that $E|\eta^{s}_\ell - \tau_\ell|^p$ never exceeds $E|\eta^{ns}_\ell - \tau_\ell|^p$, and this for arbitrary $\ell > 0$, $s > 0$, $\varepsilon > 0$, and $p \ge 1$.$^6$ Moreover, the difference between $E|\eta^{s}_\ell - \tau_\ell|^p$ and $E|\eta^{ns}_\ell - \tau_\ell|^p$ increases as $\ell$ decreases, and can be very significant for moderate to low values of $\ell$. For instance, for $\ell = 1000$, $s = 10$, $\varepsilon = .1$, and $q = .51$, we have

$$\frac{E|\eta^{ns}_\ell - \tau_\ell|}{E|\eta^{s}_\ell - \tau_\ell|} \approx 12.$$

Thus, $\eta^{ns}_\ell$ is suitable for very large values of $\ell$ since it has the interesting feature of being a function of a single observation. While also asymptotically optimal, $\eta^{s}_\ell$ does significantly better than $\eta^{ns}_\ell$ in the non-asymptotic regime, but requires roughly $\ell/s$ observations on average. To see this, note that $E\hat X_{\eta^{s}_\ell} \approx \ell$, and since $E(\hat X_t - \hat X_{t-1}) = s$, we have $E\eta^{s}_\ell \approx \ell/s$ by Wald's equality; the approximations become equalities if we ignore excess over the boundary (variously known as "overshoot"), i.e., that $\hat X_{\eta^{s}_\ell}$ may exceed $\ell$.

6 Parameter $q$ is kept equal to .51 in our study.

Concerning the fixed time estimator $\eta = \ell/s$, later it is shown (see paragraph after Lemma 1) that

$$\lim_{\ell \to \infty} \frac{E|\tau_\ell - \ell/s|^p}{C_1(\ell, s, \varepsilon, p)} = \left( \frac{1 + \varepsilon^2}{\varepsilon^2} \right)^{p/2} \tag{8}$$

which is always greater than 1. Hence $\eta = \ell/s$ is always suboptimal, in particular for small values of the noise parameter $\varepsilon$. As $\varepsilon$ increases, the observation process $Y$ becomes noisier and ultimately useless in the limit $\varepsilon \to \infty$. In this regime the fixed time estimator $\ell/s$ is optimal. In the example of Fig. 1 the right-hand side of (8) is equal to $\sqrt{5}$.
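The limiting ratio between the fixed-time estimator and the optimum, $((1+\varepsilon^2)/\varepsilon^2)^{p/2}$, is easy to tabulate. The short sketch below (the function name `fixed_time_ratio` is hypothetical) evaluates it for a few noise levels, including the Fig. 1 setting $\varepsilon = .5$, $p = 1$.

```python
import math

def fixed_time_ratio(eps, p):
    """Limit of E|tau - level/drift|^p / C_1 as the level grows (requires eps > 0)."""
    return ((1.0 + eps ** 2) / eps ** 2) ** (p / 2.0)

for eps in (0.1, 0.5, 2.0, 10.0):
    print(eps, fixed_time_ratio(eps, p=1))
```

The ratio decreases toward 1 as `eps` grows, consistent with the observations becoming useless in the high-noise limit.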

B. Delayed observations 

Consider the observation process

$$Y: \quad Y_0 = 0,\ Y_1 = 0,\ \ldots,\ Y_d = 0, \qquad Y_t = X_{t-d}, \quad t \ge d + 1,$$

for some fixed positive integer $d > 0$.

Given $d > 0$, $\ell \ge 0$, and $s \ge 0$, define the stopping rule

$$\eta^*_d = \inf\{t \ge 0 : Y_t \ge \ell - s \cdot d\}.$$

Notice that $\eta^*_d$ is a very natural candidate for estimating $\tau_\ell$ since, on average, $X_t$ is $s \cdot d$ higher than $Y_t$. In fact, the following two theorems establish optimality of $\eta^*_d$ for any $s \ge 0$.
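The delayed-observation rule can be simulated directly from a path of $X$, since $Y$ is a shifted copy of it. The sketch below is illustrative only: the function name `simulate_delayed` is hypothetical, and it assumes $s > 0$ and $\ell \ge s \cdot d$ so that the loop terminates and the lowered threshold is nonnegative.

```python
import random

def simulate_delayed(level, drift, delay, rng):
    """Return (tau, eta_star) for one path.

    tau      : first time X_t >= level
    eta_star : first time Y_t >= level - drift * delay, with Y_t = X_{t - delay}
               for t > delay and Y_t = 0 otherwise.
    Assumes drift > 0 and level >= drift * delay >= 0.
    """
    threshold = level - drift * delay
    x, t = 0.0, 0
    nu = 0 if x >= threshold else None   # first time X reaches the lowered threshold
    tau = 0 if x >= level else None      # first time X reaches the level itself
    while tau is None:
        t += 1
        x += drift + rng.gauss(0.0, 1.0)
        if nu is None and x >= threshold:
            nu = t
        if x >= level:
            tau = t
    # Y_0 = 0 already meets a threshold <= 0; otherwise Y first meets the
    # threshold exactly `delay` steps after X does.
    eta_star = 0 if threshold <= 0 else nu + delay
    return tau, eta_star

print(simulate_delayed(level=200.0, drift=1.0, delay=50, rng=random.Random(0)))
```

Lowering the threshold by $s \cdot d$ compensates, on average, for the $d$ steps by which the observations lag behind $X$.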

Theorem 3 (Delayed observations, $s = 0$). For $s = 0$, $\ell > 0$, and $p \ge 1/2$,

$$\inf_\eta E|\eta - \tau_\ell|^p = d^p = E|\eta^*_d - \tau_\ell|^p.$$

Instead, when the drift is positive we have:

Theorem 4 (Delayed observations, $s > 0$). For $s > 0$ and $p \ge 1$,

$$\inf_\eta E|\eta - \tau_\ell|^p = (1 + o(1))\, E|\eta^*_d - \tau_\ell|^p = (1 + o(1))\, C_2(d, s, p)$$

as $d \to \infty$ while $\ell = \ell(d) \ge s \cdot d$, where

$$C_2(d, s, p) \overset{\text{def}}{=} \frac{d^{p/2}}{s^p}\, E|N|^p.$$

In Theorem 4, note that $\ell$ need only be greater than or equal to $s \cdot d$, and there is no other growth rate constraint of $\ell$ with respect to $d$.
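The two theorems give different growth rates of the optimal error in the delay: $d^p$ without drift and $C_2(d,s,p) = (d^{p/2}/s^p)\,E|N|^p$ with drift. A minimal sketch comparing them follows; the function names are hypothetical, and $E|N|^p$ is evaluated with the standard Gaussian moment identity $2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$.

```python
import math

def abs_normal_moment(p):
    """E|N|^p for N ~ N(0, 1), valid for p > -1."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def c2(delay, drift, p):
    """Asymptotic optimal p-th moment for delayed observations with drift > 0."""
    return delay ** (p / 2.0) / drift ** p * abs_normal_moment(p)

def driftless_error(delay, p):
    """Optimal p-th moment d^p in the driftless case (Theorem 3)."""
    return float(delay) ** p

for d in (10, 100, 1000):
    print(d, c2(d, drift=1.0, p=1), driftless_error(d, p=1))
```

For $p = 1$ and $s = 1$ the error with drift grows like $\sqrt{2d/\pi}$, whereas without drift it grows linearly in $d$.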
Fig. 2. $E|\eta^*_d - \tau_\ell|^p / C_2(d, s, p)$ as a function of $d$ with $\ell = 100 + s \cdot d$, $s = 1$, $p = 1$.



Also, notice that $\eta^*_d$ is uniformly optimal over $p \ge 1$, similarly to $\eta^{s}_\ell$ and $\eta^{ns}_\ell$ for noisy observations. However, in contrast with $\eta^{s}_\ell$ and $\eta^{ns}_\ell$, optimality of $\eta^*_d$ is only with respect to stopping times, not with respect to arbitrary functions of $Y_0^\infty$. Indeed, if $\eta$ can be an arbitrary function of $Y_0^\infty$, then we can set $\eta = \tau_\ell$ and so achieve $E|\eta - \tau_\ell|^p = 0$; in this case $\eta$ is no longer a stopping time with respect to $Y$ since causality is violated.

Finally, note that for $s = 0$ we have $P(\eta^*_d < \tau_\ell) = 0$, i.e., it is optimal to wait until it is certain that $X$ reached level $\ell$, and the corresponding estimation error is equal to $d^p$. By contrast, the estimation error grows as $d^{p/2}$ for $s > 0$. Thus, when $s > 0$, were we to impose the additional certainty constraint $P(\eta < \tau_\ell) = 0$, the price to pay in terms of estimation error would be a multiplicative factor of the order of $d^{p/2}$.

Fig. 2 represents a numerical evaluation of

$$\frac{E|\eta^*_d - \tau_\ell|^p}{C_2(d, s, p)} \tag{9}$$

as a function of $d$ with $\ell = 100 + s \cdot d$, for $p = 1$ and $s = 1$. The function is roughly equal to 1, in agreement with Theorem 4. The small oscillations around 1 are due to our simulation, which evaluates (9) with a finite number of random samples. Here this number suffices to guarantee a precision equal to $\delta = .03$. Simulation details are provided in the appendix.

C. Continuous time 

Theorems Q] [5] and |4] remain valid if we replace X and 
Y by their continuous time counterparts; i.e., 

$$X_t = s \cdot t + B_t$$

and either

$$Y_t = X_t + \varepsilon W_t$$

for noisy observations, or

$$Y_t = X_{t-d}$$

for delayed observations, where $\{B_t\}_{t \ge 0}$ and $\{W_t\}_{t \ge 0}$ are independent standard Wiener processes. The proofs of the results in continuous time are omitted since the arguments closely follow those in discrete time and often get simplified, as there are no issues related to barrier overshoot.

III. Proofs 

In this section we first prove Theorems 2 and 4, then Theorem 3. To prove Theorems 2 and 4 we often use the following lemma, whose proof is deferred to the end of this section, on the concentration of $\tau_\ell$ around its mean:

Lemma 1. Let $S_t = \sum_{i=1}^{t} Z_i$ where $Z_1, Z_2, \ldots$ are i.i.d. Gaussian random variables with mean $0 < s < \infty$ and variance $0 < \sigma^2 < \infty$. Let $0 \le \ell < \infty$ and let

$$\mu = \inf\{t \ge 1 : S_t \ge \ell\}.$$

Then,
i. the following inequalities hold:

$$P(\mu \le \ell/s - z) \le \exp\left( -\frac{s^2 z^2}{2\sigma^2 (\ell/s - z)} \right) \tag{10}$$

for $0 \le z < \ell/s$;

$$P(\mu \ge \ell/s + z) \le \exp\left( -\frac{s^2 z^2}{2\sigma^2 (\ell/s + z)} \right) \tag{11}$$

for $z \ge 0$;
ii. for any $p > 0$

$$E|\mu - \ell/s|^p \le k_1 (k_2 + \ell)^{p/2} \tag{12}$$

where $0 < k_1, k_2 < \infty$ are constants that depend on $p$, $s$, $\sigma^2$ but not on $\ell$;
iii. as $\ell \to \infty$,

$$\frac{\mu - \ell/s}{\sqrt{\ell \sigma^2 / s^3}} \to \mathcal{N}(0, 1)$$

in distribution.
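Claim iii. can be checked empirically. The sketch below assumes the normalization $(\mu - \ell/s)/\sqrt{\ell\sigma^2/s^3}$, which is the standard central limit scaling for first-passage times of a random walk with positive drift; the function name `standardized_passage_times` and all parameter values are illustrative.

```python
import math
import random

def standardized_passage_times(level, mean, sigma, n_paths, seed=0):
    """Sample (mu - level/mean) / sqrt(level * sigma^2 / mean^3) over n_paths walks.

    mu is the first t >= 1 at which S_t = Z_1 + ... + Z_t reaches `level`,
    with Z_i i.i.d. Gaussian(mean, sigma^2) and mean > 0.
    """
    rng = random.Random(seed)
    scale = math.sqrt(level * sigma ** 2 / mean ** 3)
    values = []
    for _ in range(n_paths):
        s_t, t = 0.0, 0
        while True:
            t += 1
            s_t += rng.gauss(mean, sigma)
            if s_t >= level:
                break
        values.append((t - level / mean) / scale)
    return values

z = standardized_passage_times(level=200.0, mean=1.0, sigma=1.0, n_paths=1000)
sample_mean = sum(z) / len(z)
sample_var = sum((v - sample_mean) ** 2 for v in z) / (len(z) - 1)
print(sample_mean, sample_var)  # roughly 0 and 1 for large levels
```

The small positive bias in the sample mean comes from the overshoot at the crossing time and vanishes, after normalization, as the level grows.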



Claim iii. of Lemma 1 implies (8). To see this, let $\tau_\ell$ be the first time process $X$ reaches level $\ell$. Claim iii. of Lemma 1 then gives

$$E|\tau_\ell - \ell/s|^p = (1 + o(1))\, \frac{\ell^{p/2}}{s^{3p/2}}\, E|N|^p \qquad (\ell \to \infty) \tag{13}$$

where $N \sim \mathcal{N}(0, 1)$. This establishes (8).

The following basic fact is repeatedly used in the proofs of Theorems 2 and 4.

Fact 1. Let $(S, Q)$ be two arbitrary random variables. Then,

$$\inf_{\eta(S)} E|\eta f(S) - g(S) - h(Q)|^p = \inf_{\eta(S)} E|\eta - h(Q)|^p, \qquad p > 0,$$

for any functions $g(\cdot)$ and $h(\cdot)$, and any function $f(\cdot)$ such that $f(S) > 0$ almost surely.

To see this, notice first the obvious inequality

$$\inf_{\eta(S)} E|\eta f(S) - g(S) - h(Q)|^p \ge \inf_{\eta(S)} E|\eta - h(Q)|^p.$$

To see that

$$\inf_{\eta(S)} E|\eta f(S) - g(S) - h(Q)|^p \le \inf_{\eta(S)} E|\eta - h(Q)|^p,$$

observe that for any $\eta = \eta(S)$ one can find $\tilde\eta = \tilde\eta(S)$ such that

$$\tilde\eta f(S) - g(S) = \eta$$

almost surely since $f(S) > 0$ almost surely.

To illustrate Fact 1, consider the following simple example, variations of which appear in the proofs of Theorems 2 and 4.

Let $X = Y + Z$ where $Y$ and $Z$ are arbitrary random variables. Then, for any $c > 0$,

$$\inf_{\eta(Y)} E|\eta - c \cdot X|^p = c^p \inf_{\eta(Y)} E|\eta/c - X|^p = c^p \inf_{\eta(Y)} E|\eta/c - Y - Z|^p = c^p \inf_{\eta(Y)} E|\eta - Z|^p,$$

where the last equality follows from Fact 1 with $S = Y$, $Q = Z$, $f(S) = 1/c$, $g(S) = S$, and $h(Q) = Q$.

We now prove Theorems 2 and 4, then Theorem 3. Throughout the proofs, $N$ always denotes a zero mean unit variance Gaussian random variable.

A. Proof of Theorem 2

We first show that

$$\inf_{\eta(Y_0^\infty)} E|\eta - \tau_\ell|^p \ge (1 + o(1))\, C_1(\ell, s, \varepsilon, p), \tag{14}$$

where $C_1(\ell, s, \varepsilon, p)$ is defined in Theorem 2, then show that $E|\eta - \tau_\ell|^p$ is equal to the right-hand side of (14) for $\eta = \eta^{s}_\ell$ and $\eta = \eta^{ns}_\ell$. Before proceeding formally, we outline the main arguments.

To show (14), the main idea is to reduce the minimization problem of estimating $\tau_\ell$ to the one of estimating process $X$ at an instant close to $\ell/s$, the expected time $X$ reaches level $\ell$. To do this reduction, let $t^*$ be such that $t^* \approx \ell/s$ while satisfying $P(\tau_\ell > t^*) \approx 1$; one such instant is the $t^*$ defined in (5). It then follows that

$$\tau_\ell \approx t^* + \frac{\ell - X_{t^*}}{s} \tag{15}$$

since the time it takes for $X$ to go up by $a > 0$ is $a/s$ plus some small Gaussian term, by Claim iii. of Lemma 1. From (15), the fact that $Y_{t^*}$ is a sufficient statistic for $X_{t^*}$, and that $t^*$ is close to $\ell/s$, one can show that

$$\inf_{\eta(Y_0^\infty)} E|\eta - \tau_\ell|^p \ge (1 + o(1))\, \frac{1}{s^p} \inf_{\eta(Y_{t^*})} E|\eta - X_{t^*}|^p \tag{16}$$

where the infimum on the right-hand side is over estimators that depend only on $Y_{t^*}$. Since $(X_{t^*}, Y_{t^*})$ are jointly Gaussian, for all $p \ge 1$ the infimum on the right-hand side of (16) is achieved by $\hat X_{t^*}$, the mmse estimator (3) of $X_{t^*}$ given observation $Y_{t^*}$. It then follows that

$$\inf_{\eta(Y_{t^*})} E|\eta - X_{t^*}|^p = (1 + o(1)) \left( \frac{\ell \varepsilon^2}{s (1 + \varepsilon^2)} \right)^{p/2} E|N|^p$$

which, together with (16), gives (14).

To achieve the right-hand side of (14), it is natural to consider the stopping time

$$\eta^{ns}_\ell = t^* + \left\lfloor \frac{(\ell - \hat X_{t^*})_+}{s} \right\rfloor \tag{17}$$

which is similar to the right-hand side expression of (15), except that $X_{t^*}$ is replaced by its (optimal) mmse estimator $\hat X_{t^*}$ (the discrepancy due to the rounding in (17) plays no role asymptotically).

This stopping time is in fact optimal since the moments of $\eta^{ns}_\ell - \tau_\ell$ coincide with the right-hand side of (14), asymptotically. Finally, since $\hat X_t$ is the best estimator of $X_t$, $\eta^{s}_\ell$ also represents a natural candidate since it is based on sequentially estimating $X$ in an optimal fashion.

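We proceed with the formal proof.

Lower bound: Fix $p \ge 1$ and fix an integer $t \ge 1$; later we take $t = t^*$ defined in (5).

Then,

$$\begin{aligned}
\Big( \inf_{\eta(Y_0^\infty)} E|\eta - \tau_\ell|^p \Big)^{1/p}
&\ge \inf_{\eta(Y_0^\infty)} \Big( E\Big|\eta - t - \frac{\ell - X_t}{s}\Big|^p \Big)^{1/p} - \Big( E\Big|\tau_\ell - t - \frac{\ell - X_t}{s}\Big|^p \Big)^{1/p} \\
&= \inf_{\eta(Y_t)} \Big( E\Big|\eta - t - \frac{\ell - X_t}{s}\Big|^p \Big)^{1/p} - \Big( E\Big|\tau_\ell - t - \frac{\ell - X_t}{s}\Big|^p \Big)^{1/p}
\end{aligned} \tag{18}$$

where the inequality holds by the triangle inequality, and where the last equality holds since $Y_t$ is a sufficient statistic for $X_t$.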

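Since $(X_t, Y_t)$ are jointly Gaussian,

$$X_t = \hat X_t + \left( \frac{t \varepsilon^2}{1 + \varepsilon^2} \right)^{1/2} N, \tag{19}$$

where $\hat X_t$ is the mmse estimator of $X_t$ given observation $Y_t$ defined in (3), and where $N \sim \mathcal{N}(0, 1)$ is independent of $\hat X_t$. Hence,

$$\begin{aligned}
\inf_{\eta(Y_t)} E\Big|\eta - t - \frac{\ell - X_t}{s}\Big|^p
&= \frac{1}{s^p} \inf_{\eta(Y_t)} E|\eta s - t s - \ell + X_t|^p \\
&= \frac{1}{s^p} \inf_{\eta(Y_t)} E|\eta - X_t|^p \\
&= \frac{1}{s^p} E|X_t - \hat X_t|^p \\
&= \left( \frac{t \varepsilon^2}{s^2 (1 + \varepsilon^2)} \right)^{p/2} E|N|^p.
\end{aligned} \tag{20}$$

The second equality follows from Fact 1. The third equality holds since the mmse estimator of $X_t$ minimizes the average of any absolute moment with respect to $X_t$. The fourth equality holds by (19).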

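We now upper bound the second term on the right-hand side of (18). As we shall see, compared to the first term, the contribution of the second term is negligible when $t = t^*$.

We have

$$\begin{aligned}
E|(\tau_\ell - t) - (\ell - X_t)/s|^p
&= E\big( |(\tau_\ell - t) - (\ell - X_t)/s|^p ;\ \tau_\ell \le t \big) \\
&\quad + E\big( |(\tau_\ell - t) - (\ell - X_t)/s|^p ;\ \tau_\ell > t \big).
\end{aligned} \tag{21}$$

For the first term on the right-hand side of (21),

$$\begin{aligned}
E\big( |(\tau_\ell - t) - (\ell - X_t)/s|^p ;\ \tau_\ell \le t \big)
&\le E\big( (t + \ell/s + |X_t|/s)^p ;\ \tau_\ell \le t \big) \\
&\le \big[ E(t + \ell/s + |X_t|/s)^{2p}\, P(\tau_\ell \le t) \big]^{1/2}
\end{aligned} \tag{22}$$

by the triangle inequality and the Cauchy-Schwarz inequality, respectively.

For the second term on the right-hand side of (21),

$$\begin{aligned}
E\big( |(\tau_\ell - t) - (\ell - X_t)/s|^p ;\ \tau_\ell > t \big)
&= E\big( |(\tau_\ell - t) - (\ell - X_t)/s|^p \,\big|\, \tau_\ell > t \big)\, P(\tau_\ell > t) \\
&\le E\Big( E\big( |(\tau_\ell - t) - (\ell - X_t)/s|^p \,\big|\, X_t, \tau_\ell > t \big) \,\Big|\, \tau_\ell > t \Big) \\
&\le k_1 E\big( k_2 + (\ell - X_t)_+ \big)^{p/2}
\end{aligned} \tag{23}$$

where the second inequality follows from Claim ii. of Lemma 1 and the strong Markov property of $X$ at time $t$, with $k_1, k_2 > 0$ being constants that depend only on $p$ and $s$.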
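Combining (18), (20), (21), (22), and (23) yields

$$\begin{aligned}
\Big( \inf_{\eta(Y_0^\infty)} E|\eta - \tau_\ell|^p \Big)^{1/p}
&\ge \left( \frac{1}{s^p} \left( \frac{t \varepsilon^2}{1 + \varepsilon^2} \right)^{p/2} E|N|^p \right)^{1/p} \\
&\quad - \Big( \big[ E(t + \ell/s + |X_t|/s)^{2p}\, P(\tau_\ell \le t) \big]^{1/2} + k_1 E\big( k_2 + (\ell - X_t)_+ \big)^{p/2} \Big)^{1/p}.
\end{aligned} \tag{24}$$

Finally, letting $t = t^*$ where $t^*$ is defined in (5), we have

$$P(\tau_\ell \le t^*) \le \exp\big( -\Omega(\ell^{2q-1}) \big)$$

by Claim i. of Lemma 1.$^7$ Therefore,

$$E(t^* + \ell/s + |X_{t^*}|/s)^{2p}\, P(\tau_\ell \le t^*) = o(1) \qquad (\ell \to \infty) \tag{25}$$

since

$$X_{t^*} \overset{d}{=} s \cdot t^* + (t^*)^{1/2} N. \tag{26}$$

From (26) and (5) we also get

$$E\big( k_2 + (\ell - X_{t^*})_+ \big)^{p/2} = O(\ell^{qp/2}) = o(\ell^{p/2}) \tag{27}$$

since $q < 1$. From (24) with $t = t^*$, (25), and (27) we get

$$\inf_{\eta(Y_0^\infty)} E|\eta - \tau_\ell|^p \ge (1 + o(1)) \left( \frac{\ell \varepsilon^2}{s^3 (1 + \varepsilon^2)} \right)^{p/2} E|N|^p \tag{28}$$

as $\ell \to \infty$, yielding the desired result.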

Next, we establish the asymptotic optimality of $\eta^{s}_\ell$ and $\eta^{ns}_\ell$ by showing that their absolute moments with respect to $\tau_\ell$ are equal to the right-hand side of (28). The proof of optimality of $\eta^{s}_\ell$ uses most of the arguments of the proofs of [2, Theorem 2.1], which establishes optimality of $\eta^{s}_\ell$ for $p = 1$, together with some of the arguments used to establish optimality of $\eta^{ns}_\ell$.

Achievability, $\eta^{ns}_\ell$: To simplify exposition, we ignore discrepancies due to the rounding of non-integer quantities as they play no role asymptotically. In particular, we assume that $\eta^{ns}_\ell$ is given by

$$\eta^{ns}_\ell = t^* + \frac{(\ell - \hat X_{t^*})_+}{s}$$

without rounding the fraction.$^8$ Notice that if $\eta^{ns}_\ell$, as defined above, is asymptotically optimal, then a triangle inequality argument immediately shows that $\eta^{ns}_\ell$ with the rounding of the fraction is also asymptotically optimal.
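Let

$$A \overset{\text{def}}{=} \tau_\ell - t^* \tag{29}$$

and let

$$\hat A \overset{\text{def}}{=} (\ell - \hat X_{t^*})_+ / s. \tag{30}$$

Then,

$$E|\eta^{ns}_\ell - \tau_\ell|^p = E\big( |\hat A - A|^p ;\ \tau_\ell > t^* \big) + E\big( |\eta^{ns}_\ell - \tau_\ell|^p ;\ \tau_\ell \le t^* \big). \tag{31}$$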

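For the first term on the right-hand side of (31),

$$E\big( |\hat A - A|^p ;\ \tau_\ell > t^* \big) \le E\big( |\hat A - A|^p ;\ \hat X_{t^*} < \ell,\ \tau_\ell > t^* \big) + E\big( |A|^p ;\ \hat X_{t^*} \ge \ell \big). \tag{32}$$

By the triangle inequality,

$$\begin{aligned}
\big( E( |\hat A - A|^p ;\ \hat X_{t^*} < \ell,\ \tau_\ell > t^* ) \big)^{1/p}
&\le \big( E( |\hat A - (\ell - X_{t^*})/s|^p ;\ \hat X_{t^*} < \ell,\ \tau_\ell > t^* ) \big)^{1/p} \\
&\quad + \big( E( |(\ell - X_{t^*})/s - A|^p ;\ \hat X_{t^*} < \ell,\ \tau_\ell > t^* ) \big)^{1/p} \\
&\le \big( E( |\hat A - (\ell - X_{t^*})/s|^p ;\ \hat X_{t^*} < \ell ) \big)^{1/p} \\
&\quad + \big( E( |(\ell - X_{t^*})/s - A|^p ;\ \tau_\ell > t^* ) \big)^{1/p}.
\end{aligned} \tag{33}$$

For the first term on the right-hand side of (33),

$$\begin{aligned}
E\big( |\hat A - (\ell - X_{t^*})/s|^p ;\ \hat X_{t^*} < \ell \big)
&= E\big( |(X_{t^*} - \hat X_{t^*})/s|^p ;\ \hat X_{t^*} < \ell \big) \\
&\le E|(X_{t^*} - \hat X_{t^*})/s|^p \\
&= \frac{1}{s^p} \left( \frac{t^* \varepsilon^2}{1 + \varepsilon^2} \right)^{p/2} E|N|^p
\end{aligned} \tag{34}$$

where the last equality follows from (19).

For the second term on the right-hand side of (33) we use (23) with $t = t^*$ to get

$$E\big( |(\ell - X_{t^*})/s - A|^p ;\ \tau_\ell > t^* \big) \le k_1 E\big( k_2 + (\ell - X_{t^*})_+ \big)^{p/2} \tag{35}$$

where $k_1$, $k_2$ are constants that depend on $p$ and $s$ only.

For the second term on the right-hand side of (32), the Cauchy-Schwarz inequality yields

$$E\big( |A|^p ;\ \hat X_{t^*} \ge \ell \big) \le \big( E|A|^{2p} \big)^{1/2}\, P(\hat X_{t^*} \ge \ell)^{1/2}. \tag{36}$$

By the triangle inequality,

$$\big( E|A|^{2p} \big)^{1/2p} \le \big( E|\tau_\ell - \ell/s|^{2p} \big)^{1/2p} + \big( E|\ell/s - t^*|^{2p} \big)^{1/2p} \le k_1 (k_2 + \ell)^{1/2} + (\ell/s)^q, \tag{37}$$

where for the second inequality we used Claim ii. of Lemma 1 with $k_1$, $k_2$ constants that depend on $p$ and $s$, and the definition of $t^*$ (recall that we ignore discrepancies due to the rounding of non-integer quantities).

From (32), (33), (34), (35), (36), and (37) we obtain

$$\begin{aligned}
E\big( |\hat A - A|^p ;\ \tau_\ell > t^* \big)
&\le \left( \left( \frac{1}{s^p} \left( \frac{t^* \varepsilon^2}{1 + \varepsilon^2} \right)^{p/2} E|N|^p \right)^{1/p} + \Big( k_1 E\big( k_2 + (\ell - X_{t^*})_+ \big)^{p/2} \Big)^{1/p} \right)^p \\
&\quad + \big[ k_1 (k_2 + \ell)^{1/2} + (\ell/s)^q \big]^p\, P(\hat X_{t^*} \ge \ell)^{1/2}.
\end{aligned} \tag{38}$$

For the second term on the right-hand side of (31), using the Cauchy-Schwarz inequality and the triangle inequality we get

$$\begin{aligned}
E\big( |\tau_\ell - \eta^{ns}_\ell|^p ;\ \tau_\ell \le t^* \big)
&\le \Big[ \big( E|\tau_\ell - \ell/s|^{2p} \big)^{1/2p} + \big( E|\ell/s - \eta^{ns}_\ell|^{2p} \big)^{1/2p} \Big]^p\, P(\tau_\ell \le t^*)^{1/2} \\
&\le \big[ k_1 (k_2 + \ell)^{1/2} + k_3 (k_4 + \ell)^{1/2} \big]^p\, P(\tau_\ell \le t^*)^{1/2}
\end{aligned} \tag{39}$$

where for the second inequality we used Claim ii. of Lemma 1 with $k_3$, $k_4$ constants that depend on $p$, $s$, and $\varepsilon$.

Combining (31), (38), and (39),

$$\begin{aligned}
E|\eta^{ns}_\ell - \tau_\ell|^p
&\le \left( \left( \frac{1}{s^p} \left( \frac{t^* \varepsilon^2}{1 + \varepsilon^2} \right)^{p/2} E|N|^p \right)^{1/p} + \Big( k_1 E\big( k_2 + (\ell - X_{t^*})_+ \big)^{p/2} \Big)^{1/p} \right)^p \\
&\quad + \big[ k_1 (k_2 + \ell)^{1/2} + (\ell/s)^q \big]^p\, P(\hat X_{t^*} \ge \ell)^{1/2} \\
&\quad + \big[ k_1 (k_2 + \ell)^{1/2} + k_3 (k_4 + \ell)^{1/2} \big]^p\, P(\tau_\ell \le t^*)^{1/2}.
\end{aligned} \tag{40}$$

Using (5) and Claim i. of Lemma 1 one deduces that the third and fourth terms on the right-hand side of (40) tend to zero as $\ell \to \infty$. Since $X_{t^*} \overset{d}{=} s \cdot t^* + (t^*)^{1/2} N$ and $t^* = (\ell/s)(1 + o(1))$, we conclude that

$$E|\eta^{ns}_\ell - \tau_\ell|^p \le (1 + o(1)) \left( \frac{\ell \varepsilon^2}{s^3 (1 + \varepsilon^2)} \right)^{p/2} E|N|^p = (1 + o(1))\, C_1(\ell, s, \varepsilon, p)$$

as $\ell \to \infty$, where

$$C_1(\ell, s, \varepsilon, p) \overset{\text{def}}{=} \left( \frac{\ell \varepsilon^2}{s^3 (1 + \varepsilon^2)} \right)^{p/2} E|N|^p.$$

This establishes the asymptotic optimality of $\eta^{ns}_\ell$.

7 $\Omega(\cdot)$ refers to standard order notations, see, e.g., [3, Chapter 3].

8 As such, $\eta^{ns}_\ell$ is no longer a stopping time, strictly speaking.

Achievability, $\eta^{s}_\ell$: We write $E|\eta^{s}_\ell - \tau_\ell|^p$ as

$$E|\eta^{s}_\ell - \tau_\ell|^p = E\big( |\eta^{s}_\ell - \tau_\ell|^p ;\ \eta^{s}_\ell > \tau_\ell \big) + E\big( |\tau_\ell - \eta^{s}_\ell|^p ;\ \tau_\ell > \eta^{s}_\ell \big), \tag{41}$$

and upper bound each of the two terms on the right-hand side of the above equation. As in the previous section, we ignore discrepancies due to the rounding of non-integer quantities as they play no role asymptotically. In particular, we treat $\ell/s$ as an integer.

Letting

$$\nu \overset{\text{def}}{=} \inf\{t \ge 0 : \hat X_{\tau_\ell + t} \ge \ell\}$$

we have

$$\begin{aligned}
E\big( |\eta^{s}_\ell - \tau_\ell|^p ;\ \eta^{s}_\ell > \tau_\ell \big)
&\le E\big( \nu^p ;\ \hat X_{\tau_\ell} < \ell \big) \\
&\le \Big( \big[ E( (\ell - \hat X_{\tau_\ell})^p ;\ \hat X_{\tau_\ell} < \ell ) \big]^{1/p} / s + \big[ E( |\nu - (\ell - \hat X_{\tau_\ell})/s|^p ;\ \hat X_{\tau_\ell} < \ell ) \big]^{1/p} \Big)^p \\
&\le \Big( \big[ E (X_{\tau_\ell} - \hat X_{\tau_\ell})_+^p \big]^{1/p} / s + \big[ E( |\nu - (\ell - \hat X_{\tau_\ell})/s|^p ;\ \hat X_{\tau_\ell} < \ell ) \big]^{1/p} \Big)^p
\end{aligned} \tag{42}$$

where the first inequality follows from the definitions of $\eta^{s}_\ell$ and $\nu$, the second inequality follows from the triangle inequality, and the third holds since $X_{\tau_\ell} \ge \ell$. Here and below, by $x_+^p$ we actually mean $(x_+)^p$.

We upper bound the two expectations on the right-hand side of (42).

For the first term, for $i \ge 1$ let

$$\begin{aligned}
U_i &\overset{\text{def}}{=} (X_i - \hat X_i) - (X_{i-1} - \hat X_{i-1}) \\
&= \frac{\varepsilon^2}{1 + \varepsilon^2}\, V_i - \frac{\varepsilon}{1 + \varepsilon^2}\, W_i
\end{aligned} \tag{43}$$

$$\overset{d}{=} \frac{\varepsilon}{(1 + \varepsilon^2)^{1/2}}\, N. \tag{44}$$

Then,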



£/s £/s 

X Tl -X Ti =J2Ui-t{n<£/s} £ Ui 

i—1 i— T£-\-l 

+ t{n >m} J2 u *> < 45 > 

i=l/s+l 



1 1 {A} denotes the indicator function of event A- 



and, by the triangle inequality] n \ 



\E(X Tt -£ Tl f + ]V*< 



5> 



1/p 



+ 



t{n <i/s} J2 u > 

i=Tg-\-l 



E(l{n>£/s} J2 Ui 

i=t/a+l 



1//' 



I//-' 



(46) 



We bound each term on the right-side of (|46V For the first 
term, from (l44l we have 



t/a 



E \J2 Ui ) = (W^V(l + e 2 )) p/2 EN p 



(47) 



For the second term on the right-side of (l46l , using (PHI to- 
gether with the fact that r^ is independent of U Ti +i, U Ti +2, ■ ■ ■ 
we get 

*/« 
e(-1{t<<£/«} ^ f/,) P 

i=Tf + 1 

= E[(£/s - n)+e 2 /(l + e 2 )f / 2 E Nl 

< {E\e/ S - Te \^ 2 )ENl 

<k 1 {k 2 +£) p/ ' i EN p + 

= 0{£ p/A ) , (48) 

where for the first inequality we bounded e 2 /(l+e 2 ) by 1, and 

where for the second inequality we used Claim ii. of LemmaQ] 

For the third term on the right-side of (l46l . using (143V the 

triangle inequality, and by upperbounding £ 2 /(l + e 2 ) and 

e/(l + e 2 ) by 1, we get 

i/p 



We now focus on the second expectation on the right-side 
of (l49l i. Since, on {rg > £/s}, we have 

J2 Vi = (X Tt - X e/S ) - s(r e - l/s) , 
i=e/ s +i 

we consider the shifted process {Xt — Xg/ s }t>g/ a and its 
crossing of level £ — Xi/ S . It then follows that 

E(t{ n >£/s} jr Vl y + 

i=£/s+l 

= s p E( [(X Te - X e/s )/s - fa - £/s)] p + ; n > £/s, X e/s < £) 

< s p E (\(X Ti - X e/s )/s - (r e - £/s)\ p \n > t/s,X i/s < £) 

< fciE(fc 2 + (X Te - X e/s )+) p / 2 

= 0(£ p/i ) (51) 

where k\ , k 2 are constants that depend only on s and p, and 
where the second inequality follows Claim ii. of Lemma Q] 
and the Markov property of process X at time £/s. We now 
justify the second equality in ( BTT l. We have 

d „ 



X 



and 



t/a 



x T 



£/sN 



where e Tf denotes the excess over the boundary at time re. 
Using this and the triangle inequality we get 

2/p 



(E(X Te X e/S )f) P < (Ee p { 2 ) 2 / p + ^£Ts(EN p + /2 )y p , 

(52) 
which implies that 

E(X n - X e/S ) p ^ 2 = 0(£ p ' A ) 

rt/2 

since Eev, can be upper bounded by a finite constant that 
is independent of £ ([7 Equation (2)]). This establishes the 
second equality in (IBTT l. 

Combining (|49b together with ( T50b and ( BTT l yields 



{l{r e >£/ S } J2 U i) + 

i=£/s+l 

T 

<E((l{Tt>£/8} J2 Wi ) + 

i=i/s+l 

T 

+ (e(i{t, > e/ s } J2 ^)+ 

i=i/s+l 

Since Tg and {Wi} are independent, we have 



i/p 



i/p 



l{r>£/ S } ^ C/ 4 ) = 
From (06), (07), (08), and (|53j we get 

!(X Ti -X T J^< (1 + 0(1))' '-"" 



0{fi 



p/4s 



(53) 



(49) 



p/2 



EJV?. (54) 



l{r/ >*/s} E Wi = y/(T t -e/s)+N, 
i—e/s+i 

and a similar calculation as for (l48l shows that 



E(l{r, >^/ S } J2 WiY=0{P'*). 



,s(l + e 2 ), 

For the second expectation on the right-hand side of (l42l 
we have 

E\v -{£- X Te )/s\ p ; X r , < £)] < k 3 E[k 4 + (£ - X Te )+] p/i 

= 0{£ p/A ) , (55) 

where the inequality follows from the strong Markov property 
of X at time ti together with Claim ii. of Lemma Q] with k$ 
and k± constants that depend on s and e. 
From (02]i, d54]l, and (|55]l we get 



(50) 



i=£/a+l 



l(\vf-Ti\ p ;vf>Tt)< (l + o(l)) 



(z 1 



S 3 (l + £ 2 ) 



p/2 



°By x^_ we actually mean (x_|_) p . 



E7VJ. 
(56) 



Using analogous arguments as for establishing (|56*T l, which 
essentially amounts to swap the roles of X and X and the 
roles of T( and r/|, we get 



\n-V?\ p m>V?)<(l + o(l))L^ 



£e 2 ^ " " 



, , ENl. 
+ e 2 )J + 

(57) 



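Finally, from (41), (56), and (57), the $p$-th absolute moment of the estimation error of this stopping rule is at most

$$(1+o(1)) \left(\frac{\ell\varepsilon^2}{s^3(1+\varepsilon^2)}\right)^{p/2} E|N|^p \qquad (\ell \to \infty),$$

which establishes its asymptotic optimality.

B. Proof of Theorem 5

As mentioned earlier, $\eta^*_d$ is a very natural stopping time to consider since, on average, $X_t$ is $s \cdot d$ higher than $Y_t$. Now, the time needed to go from level $\ell - s \cdot d$ to level $\ell$ has (approximately) the Gaussian distribution $d + (\sqrt{d}/s)N$ by Claim iii. of Lemma 1. Hence we have $\tau_\ell - \eta^*_d \approx (\sqrt{d}/s)N$, which yields the second equality in Theorem 4. The optimality of $\eta^*_d$ is established essentially by showing that any (asymptotically) optimal stopping rule should not stop later than $\eta^*_d$.

Lower bound: Let $\ell$ be any function of $d$ such that $\ell \ge s \cdot d$, and fix an integer $d \ge 1$. Further, let

$$\nu = \inf\{t \ge 0 : X_t \ge \ell - s \cdot d(1-\varepsilon)\}$$

where $\varepsilon$ is a constant such that $0 < \varepsilon < 1$; later we take $\varepsilon \to 0$.

Then,

$$\begin{aligned}
\inf_{\eta} E|\eta - \tau_\ell|^p &\ge \inf_{\eta} E(|\eta - \tau_\ell|^p ; \tau_\ell \le \nu + d) \\
&\ge \inf_{\eta(Y_0^{\nu+d}) \le \nu + d} E(|\eta - \tau_\ell|^p ; \tau_\ell \le \nu + d) \\
&= \inf_{\eta(X_0^{\nu}) \le \nu + d} E(|\eta - \tau_\ell|^p ; \tau_\ell \le \nu + d),
\end{aligned} \tag{58}$$

where the infimum on the right-hand side of the second inequality is over all estimators that depend on $Y_0^{\nu+d}$ (these estimators need not be stopping times), and where the equality holds since $Y_t = X_{t-d}$.

Let

$$\delta \overset{\text{def}}{=} \inf\{t \ge 0 : X_{\nu+t} \ge \ell\},$$

so that, by definition,

$$\tau_\ell = \nu + \delta.$$

Then,

$$\begin{aligned}
\inf_{\eta(X_0^{\nu}) \le \nu + d} E(|\eta - \tau_\ell|^p ; \tau_\ell \le \nu + d)
&= \inf_{\eta(X_0^{\nu}) \le \nu + d} E(|\eta - (\nu + \delta)|^p ; 0 \le \delta \le d) \\
&= \inf_{\eta(X_0^{\nu}) \le \nu + d} E(|\eta - \delta|^p ; 0 \le \delta \le d) \\
&= \inf_{\eta(X_\nu) \le \nu + d} E(|\eta - \delta|^p ; 0 \le \delta \le d) \\
&\ge \inf_{\eta(X_\nu)} E(|\eta - \delta|^p ; 0 \le \delta \le d) \\
&\ge \inf_{\eta(X_\nu)} E(|\eta - \delta|^p ; 0 \le \delta \le d, e_\nu \le c).
\end{aligned} \tag{59}$$

The second equality in (59) follows from Fact 1. The infimum on the right-hand side of the third equality is over estimators that depend on $X_\nu$ only, since $\delta$ is defined over $X_\nu, X_{\nu+1}, \ldots$. The last inequality holds for an arbitrary fixed constant $c > 0$, with $e_\nu$ defined as the excess at time $\nu$, i.e.,

$$e_\nu = X_\nu - (\ell - s \cdot d(1-\varepsilon)) \ge 0.$$

Take $d$ large enough so that

$$s d \varepsilon > c, \tag{60}$$

and define

$$d_\nu \overset{\text{def}}{=} d(1-\varepsilon) + e_\nu/s,$$

and define the functions $f_1(d,\varepsilon)$ and $f_2(d,\varepsilon)$ as

$$f_1(d,\varepsilon) \overset{\text{def}}{=} s\sqrt{d(1-\varepsilon)}$$

and

$$f_2(d,\varepsilon) \overset{\text{def}}{=} \frac{s d \varepsilon - c}{\sqrt{d(1-\varepsilon) + c/s}}.$$

Notice that both $f_1$ and $f_2$ are strictly positive because of (60).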
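Using the definitions of $d_\nu$ and $N_\nu$ we get

$$\begin{aligned}
\inf_{\eta(X_\nu)} E(|\eta - \delta|^p ; 0 \le \delta \le d, e_\nu \le c)
&= \inf_{\eta(X_\nu)} E(|\eta - (\delta - d_\nu)|^p ; \mathcal{E}_1) \\
&\ge \inf_{\eta(X_\nu)} E(|\eta - (\delta - d_\nu)|^p ; \mathcal{E}_2) \\
&\ge \frac{(d(1-\varepsilon))^{p/2}}{s^p} \inf_{\eta(X_\nu)} E\left(\left|\eta\sqrt{s^2/d_\nu} - N_\nu\right|^p ; \mathcal{E}_2\right) \\
&= \frac{(d(1-\varepsilon))^{p/2}}{s^p} \inf_{\eta(X_\nu)} E(|\eta - N_\nu|^p ; \mathcal{E}_2),
\end{aligned} \tag{61}$$

where we defined the events

$$\mathcal{E}_1 \overset{\text{def}}{=} \{-s\sqrt{d_\nu} \le N_\nu \le s(d - d_\nu)/\sqrt{d_\nu},\ e_\nu \le c\}$$

$$\mathcal{E}_2 \overset{\text{def}}{=} \{-f_1(d,\varepsilon) \le N_\nu \le f_2(d,\varepsilon),\ e_\nu \le c\}.$$

The first equality in (61) holds by Fact 1. The first inequality holds by the definitions of $f_1(d,\varepsilon)$ and $f_2(d,\varepsilon)$ and by noting that, on $\{e_\nu \le c\}$, the range of $N_\nu$ in $\mathcal{E}_1$ contains the range of $N_\nu$ in $\mathcal{E}_2$. The second inequality holds by the definition of $N_\nu$ (so that $\delta - d_\nu = (\sqrt{d_\nu}/s)\,N_\nu$) and because on event $\mathcal{E}_2$ we have

$$d_\nu \ge d(1-\varepsilon).$$

Finally, the last equality in (61) holds by Fact 1 since $d_\nu$ is a function of $X_\nu$ (through $e_\nu$).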

Since $f_1(d,\varepsilon)$ and $f_2(d,\varepsilon)$ are increasing functions of $d$, let us pick $d$ so that the following inequality, more stringent than (60), is satisfied:

$$c < \min\{s d \varepsilon,\ f_1(d,\varepsilon),\ f_2(d,\varepsilon)\}. \tag{62}$$
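As a hedged numerical illustration of how $d$ can be picked to satisfy (62), the sketch below evaluates $f_1$ and $f_2$ as defined above and searches for the smallest integer $d$ that meets the condition. The function names and the sample values of $s$, $\varepsilon$, and $c$ are assumptions chosen for illustration only; they do not come from the paper.

```python
import math

def f1(d, eps, s):
    # f1(d, eps) = s * sqrt(d * (1 - eps))
    return s * math.sqrt(d * (1.0 - eps))

def f2(d, eps, s, c):
    # f2(d, eps) = (s*d*eps - c) / sqrt(d*(1 - eps) + c/s)
    return (s * d * eps - c) / math.sqrt(d * (1.0 - eps) + c / s)

def satisfies_62(d, eps, s, c):
    # Condition (62): c < min{s*d*eps, f1(d, eps), f2(d, eps)}
    return c < min(s * d * eps, f1(d, eps, s), f2(d, eps, s, c))

def smallest_d(eps, s, c, d_max=10**6):
    # Linear search; sufficient because f1 and f2 are increasing in d.
    for d in range(1, d_max + 1):
        if satisfies_62(d, eps, s, c):
            return d
    raise ValueError("no d <= d_max satisfies (62)")

# Hypothetical sample values (assumptions, not taken from the paper).
d0 = smallest_d(eps=0.1, s=1.0, c=5.0)
```

For these sample values the binding constraint is the one on $f_2$, which illustrates why (62) is more stringent than (60).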

It then follows that

$$E(|\eta - N_\nu|^p ; -f_1(d,\varepsilon) \le N_\nu \le f_2(d,\varepsilon), e_\nu \le c) \ge E(|\eta - N_\nu|^p ; -c \le N_\nu \le c, e_\nu \le c),$$

hence, from (61),

$$\begin{aligned}
\frac{s^p}{(d(1-\varepsilon))^{p/2}} \inf_{\eta(X_\nu)} E(|\eta - \delta|^p ; 0 \le \delta \le d, e_\nu \le c)
&\ge \inf_{\eta(X_\nu)} E(|\eta - N_\nu|^p ; -c \le N_\nu \le c, e_\nu \le c) \\
&= \inf_{\eta(X_\nu)} E(|\eta - N_\nu|^p ; -c \le N_\nu \le c \mid e_\nu \le c)\, P(e_\nu \le c).
\end{aligned} \tag{63}$$

Now, $E e_\nu$ can be upper bounded by a constant $0 < k < \infty$ that is independent of the barrier level at time $\nu$, i.e., $\ell - s d(1-\varepsilon)$ (see [7, Equation (2)]). Hence,

$$P(e_\nu \le c) \ge 1 - k/c$$

by Markov inequality. Therefore, for any fixed $0 < \varepsilon < 1$, $c$ large enough so that

$$k/c \le \varepsilon \tag{64}$$

and $d$ large enough so that (62) holds, we have

$$\frac{1}{1-\varepsilon}\,\frac{s^p}{(d(1-\varepsilon))^{p/2}} \inf_{\eta} E|\eta - \tau_\ell|^p \ge \inf_{\eta(X_\nu)} E(|\eta - N_\nu|^p ; -c \le N_\nu \le c \mid e_\nu \le c).$$

For a fixed value of $e_\nu$, $N_\nu \to N$ by Claim iii. of Lemma 1 and by the strong Markov property of $X$ at time $\nu$. Hence, $N_\nu \to N$ uniformly over $\{e_\nu \le c\}$. Therefore, taking $\liminf_{d\to\infty}$ on both sides of the above inequality we get

$$\liminf_{d\to\infty} \frac{1}{1-\varepsilon}\,\frac{s^p}{(d(1-\varepsilon))^{p/2}} \inf_{\eta} E|\eta - \tau_\ell|^p \ge \inf_{e} E(|e - N|^p ; -c \le N \le c) \ge E(|N|^p ; -c \le N \le c) \tag{65}$$

where the infimum on the right-hand side of the first inequality is over constant estimators, and where the last inequality follows from the symmetry and monotonicity of the probability density function of $N$ around zero.

Since the above inequality holds for arbitrary $0 < \varepsilon < 1$ and $c > 0$ such that (64) is satisfied, letting $c = c(\varepsilon) = k/\varepsilon$ and taking $\varepsilon \to 0$ on both sides of (65) yields

$$\liminf_{d\to\infty} \frac{s^p}{d^{p/2}} \inf_{\eta} E|\eta - \tau_\ell|^p \ge E|N|^p,$$

implying that

$$\inf_{\eta} E|\eta - \tau_\ell|^p \ge (1+o(1))\,\frac{d^{p/2}}{s^p}\,E|N|^p$$

as $d \to \infty$ while $\ell \ge s \cdot d$.

Achievability: Let $\ell \ge s \cdot d$ and define

$$\eta^*_d \overset{\text{def}}{=} \inf\{t \ge 0 : Y_t \ge \ell - s \cdot d\},$$

$$\xi \overset{\text{def}}{=} \inf\{t \ge 0 : X_t \ge \ell - s \cdot d\},$$

and

$$\Delta \overset{\text{def}}{=} \inf\{t \ge 0 : X_{t+\xi} \ge \ell\}.$$

These definitions imply that

$$\eta^*_d = \xi + d,$$

and

$$\tau_\ell = \xi + \Delta.$$

Further, define

$$\tilde{\Delta} \overset{\text{def}}{=} \inf\{t \ge 0 : X_{t+\xi} - X_\xi \ge s d\}.$$

Notice that if there were no barrier overshoot at time $\xi$, then $X_\xi = \ell - s \cdot d$, and so $\tilde{\Delta}$ would be equal to $\Delta$.

It follows that

$$\begin{aligned}
E|\eta^*_d - \tau_\ell|^p = E|\Delta - d|^p
&\le \left((E|\tilde{\Delta} - d|^p)^{1/p} + (E|\tilde{\Delta} - \Delta|^p)^{1/p}\right)^p \\
&\le \left((E|\tilde{\Delta} - d|^p)^{1/p} + (E\tau_{e_\xi}^p)^{1/p}\right)^p
\end{aligned} \tag{66}$$

where

$$e_\xi \overset{\text{def}}{=} X_\xi - (\ell - s \cdot d)$$

denotes the excess at time $\xi$. The first inequality in (66) follows from the triangle inequality and the second inequality follows from the strong Markov property of $X$ at time $\xi$.

From Claim iii. of Lemma 1 and the strong Markov property of $X$ at time $\xi$,

$$E|\tilde{\Delta} - d|^p = (1+o(1))\,\frac{d^{p/2}}{s^p}\,E|N|^p \tag{67}$$

as $d \to \infty$.

Assume that $E\tau_{e_\xi}^p$ can be upper bounded by a finite constant that does not depend on $d$. Then, from (66) and (67) we get

$$E|\eta^*_d - \tau_\ell|^p \le (1+o(1))\,\frac{d^{p/2}}{s^p}\,E|N|^p$$

as $d \to \infty$ while $\ell \ge s \cdot d$, yielding the desired result.

As we now show, the fact that $E\tau_{e_\xi}^p$ can be upper bounded by a finite constant that does not depend on $d$ essentially follows from [7, Equation (2)], which states that $E e_\xi^p$ can be upper bounded by a finite constant that does not depend on the barrier level at time $\xi$. For notational convenience, we drop the subscript $\xi$ and write $e$ in place of $e_\xi$.

If the barrier level at time $\xi$, i.e., $(\ell - s \cdot d)$, is bounded in the limit $d \to \infty$, i.e., if $\limsup_{d\to\infty}(\ell - s \cdot d) < \infty$, then clearly $E\tau_e^p$ can be upper bounded by a finite constant that does not depend on $d$.

Now, suppose that $\lim_{d\to\infty}(\ell - s \cdot d) = \infty$, and suppose, by contradiction, that $E\tau_e^p \to \infty$. We start with $p = 1$.

By Claim iii. of Lemma 1 we have

$$\tau_e = e/s + [e/s^3]^{1/2} N_e, \tag{68}$$

where $N_e \to N$ in distribution, uniformly over $\{e \ge k\}$, as $k \to \infty$. Using this,

$$\begin{aligned}
E\tau_e &\le E(\tau_k ; e \le k) + E(\tau_e ; e > k) \\
&\le E(\tau_k) + E(e/s + [e/s^3]^{1/2} N_e ; e > k) \\
&\le E(\tau_k) + (1/s)Ee + E(N_e [e/s^3]^{1/2}) \\
&\le E(\tau_k) + (1/s)Ee + s^{-3/2}[(Ee)E(N_e)^2]^{1/2} \\
&\le E(\tau_k) + (1/s)Ee + s^{-3/2}[(Ee)(2EN^2)]^{1/2}.
\end{aligned} \tag{69}$$
The first inequality holds since $\tau_\ell \ge \tau_{\ell'}$ for $\ell \ge \ell'$. The second inequality follows from (68). The fourth inequality holds by the Cauchy-Schwarz inequality. The last inequality holds by (68) for $k$ large enough.

From (69), if $E\tau_e \to \infty$ then $Ee \to \infty$, a contradiction since [7, Equation (2)] says that $Ee$ admits a finite upper bound that does not depend on the barrier level. Hence, $E\tau_e$ can be upper bounded by a finite constant that does not depend on $d$.

For $p \ge 2$, a similar argument as above shows that $E\tau_e^p < \infty$. In particular, a similar computation as in (69) holds, with the addition of a triangle inequality for the second inequality in (69), to get

$$E(\tau_e^p ; e > k) \le \left((1/s)(Ee^p)^{1/p} + \left(E(N_e^p [e/s^3]^{p/2})\right)^{1/p}\right)^p.$$

This shows that for any $\ell = \ell(d) \ge s \cdot d$, $\limsup_{d\to\infty} E\tau_e^p < \infty$, yielding the desired result.
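The achievability argument can be checked numerically with a small Monte Carlo sketch. It assumes a Gaussian random walk with drift $s$ and unit-variance increments, observed with delay $d$ (so that $Y_t = X_{t-d}$ and the rule stops $d$ steps after $X$ first reaches $\ell - s \cdot d$), and compares the empirical mean of $|\eta^*_d - \tau_\ell|$ with the asymptotic value $(\sqrt{d}/s)E|N|$. The function name, parameter values, and seed are illustrative assumptions, not the authors' simulation code.

```python
import math
import random

def simulate_delay_error(s, d, level, n_trials, seed=0):
    """Empirical mean of |eta_d - tau| over n_trials independent walks."""
    rng = random.Random(seed)
    barrier = level - s * d          # level that X must reach for the rule to fire
    total = 0.0
    for _ in range(n_trials):
        x, t = 0.0, 0
        xi = 0 if barrier <= 0 else None
        while x < level:             # run until X first reaches the target level
            t += 1
            x += s + rng.gauss(0.0, 1.0)
            if xi is None and x >= barrier:
                xi = t               # first passage of X over level - s*d
        tau = t                      # first passage of X over level
        eta = xi + d                 # delayed observation crosses d steps later
        total += abs(eta - tau)
    return total / n_trials

s, d, level = 1.0, 100, 150.0
estimate = simulate_delay_error(s, d, level, n_trials=2000, seed=1)
asymptotic = (math.sqrt(d) / s) * math.sqrt(2.0 / math.pi)  # (sqrt(d)/s) * E|N|
```

For finite $d$ the two values need not coincide, since the overshoot terms bounded in the proof are not negligible; the comparison is only meant to show the order of magnitude.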

C. Proof of Theorem 3

Fix $p \ge 1/2$. Suppose for the moment that a stopping time $\eta$ on $Y$ that satisfies $P(\eta < \tau_\ell + d) > 0$ also satisfies

$$E(|\eta - \tau_\ell|^p \mid Y_0^{\eta}, \eta < \tau_\ell + d) = \infty. \tag{70}$$

Hence, if $\eta$ satisfies $E|\eta - \tau_\ell|^p < \infty$, then necessarily

$$P(\eta \ge \tau_\ell + d) = 1.$$

From this equality it follows that

$$\inf_{\eta} E|\eta - \tau_\ell|^p = \inf_{\eta : P(\eta \ge \tau_\ell + d) = 1} E|\eta - \tau_\ell|^p \ge d^p = E|\eta^*_d - \tau_\ell|^p$$

where $\eta^*_d = \inf\{t \ge 0 : Y_t \ge \ell\}$. Therefore we have the desired result

$$\inf_{\eta} E|\eta - \tau_\ell|^p = d^p = E|\eta^*_d - \tau_\ell|^p.$$

We prove (70) assuming $P(\eta < \tau_\ell + d) > 0$. Equivalently, we show that for any stopping rule $\eta$ over $X$ (instead of $Y$) such that $P(\eta < \tau_\ell) > 0$, necessarily we have

$$E(|\eta - \tau_\ell|^p \mid X_0^{\eta}, \eta < \tau_\ell) = \infty. \tag{71}$$

Given $X_\eta = \ell - h$, for some arbitrarily fixed $h > 0$, let $\{B_t\}_{t \ge 0}$ be the continuous time version of $X$ starting at time $\eta$, i.e., $\{B_t\}_{t \ge 0}$ is a standard Wiener process starting at time $\eta$ at level $B_0 = \ell - h$ and such that $B_t = X_{\eta+t}$ for $t = 0, 1, 2, \ldots$.

Let

$$\tilde{\tau}_h = \inf\{t \ge 0 : B_t = \ell\}.$$

Suppose $\eta < \tau_\ell$. Since $\tilde{\tau}_h \le \tau_\ell - \eta$, had we proved that $E\tilde{\tau}_h^p = \infty$, (71) would hold.

From the reflection principle

$$P(\tilde{\tau}_h \le t) = 2P(B_t \ge h) = 2Q\left(\frac{h}{\sqrt{t}}\right), \qquad h > 0,\ t > 0,$$

where $Q(x) = (1/\sqrt{2\pi}) \int_x^\infty \exp(-u^2/2)\,du$. Hence,

$$\begin{aligned}
E\tilde{\tau}_h^p &= 2\int_0^\infty t^p\, dQ\left(\frac{h}{\sqrt{t}}\right) \\
&= \frac{2h}{2\sqrt{2\pi}} \int_0^\infty \frac{t^p}{t^{3/2}}\, e^{-h^2/(2t)}\, dt \\
&\ge \frac{h e^{-1/2}}{\sqrt{2\pi}} \int_{h^2}^\infty \frac{t^p}{t^{3/2}}\, dt.
\end{aligned}$$

Therefore, if $p \ge 1/2$, then $E\tilde{\tau}_h^p = \infty$, yielding the desired result. ■

D. Proof of Lemma 1

Claim i. For any real constant $q$, the partial sums $S_t$ of the increments satisfy

$$E\left[e^{qS_{t+1}} \mid S_1, \ldots, S_t\right] = e^{qS_t + qs + q^2\sigma^2/2}$$

which can readily be checked by direct computation. Hence, letting

$$M_t = e^{qS_t - rt}, \qquad t \ge 1,$$

where $r$ is an arbitrary constant, we get

$$E[M_{t+1} \mid M_1, \ldots, M_t] = M_t\, e^{qs + q^2\sigma^2/2 - r}, \qquad t \ge 1.$$

Let us set $r = qs + q^2\sigma^2/2$ so that

$$M_t = e^{qS_t - (qs + q^2\sigma^2/2)t}, \qquad t \ge 1,$$

is a martingale, and introduce the stopping time $\min\{\lfloor k \rfloor, \tau_\ell\}$ where $k > 0$ is an arbitrary constant. It follows that

$$\begin{aligned}
1 = EM_1 &= EM_{\min\{\lfloor k \rfloor, \tau_\ell\}} \\
&\ge E[M_{\tau_\ell} ; \tau_\ell \le k] \\
&\ge e^{q\ell - (qs + q^2\sigma^2/2)k}\, P(\tau_\ell \le k), \qquad q \ge 0,
\end{aligned}$$

where the second equality follows from Doob's stopping theorem and where the second inequality is valid for $q \ge 0$ since $S_{\tau_\ell} \ge \ell$ and $\tau_\ell \le k$.

It follows that

$$P(\tau_\ell \le k) \le e^{-q\ell + (qs + q^2\sigma^2/2)k}, \qquad q \ge 0. \tag{72}$$

Minimizing the right-hand side of (72) over $q \ge 0$ gives

$$P(\tau_\ell \le k) \le e^{-(\ell - sk)^2/(2\sigma^2 k)}, \tag{73}$$

which is obtained for $q = q(k) = (\ell - sk)/(\sigma^2 k)$. Note that this bound is valid for $k \le \ell/s$ since $q$ should be nonnegative. By assumption $k > 0$, so the bound on $P(\tau_\ell \le \ell/s - z)$ in Claim i. follows from (73) by letting $k = \ell/s - z$, $0 \le z < \ell/s$.

The remaining inequality of Claim i. follows from the Chernoff bound.
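The optimization leading from (72) to (73) can be verified numerically. The sketch below evaluates the exponent of the right-hand side of (72), plugs in the minimizer $q(k) = (\ell - sk)/(\sigma^2 k)$, and compares the result with the closed form (73). Function names and the sample parameter values are assumptions made for illustration.

```python
import math

def exponent_72(q, level, s, sigma2, k):
    # Exponent of the right-hand side of (72): -q*l + (q*s + q^2*sigma^2/2)*k
    return -q * level + (q * s + q * q * sigma2 / 2.0) * k

def optimal_q(level, s, sigma2, k):
    # Minimizer of the exponent over q; nonnegative when k <= level / s
    return (level - s * k) / (sigma2 * k)

def bound_73(level, s, sigma2, k):
    # Closed-form bound (73), valid for 0 < k <= level / s
    if not 0 < k <= level / s:
        raise ValueError("(73) requires 0 < k <= level / s")
    return math.exp(-((level - s * k) ** 2) / (2.0 * sigma2 * k))

# Hypothetical sample values (assumptions): level 50, drift 1, unit variance, k = 30.
level, s, sigma2, k = 50.0, 1.0, 1.0, 30.0
q_star = optimal_q(level, s, sigma2, k)
bound_from_72 = math.exp(exponent_72(q_star, level, s, sigma2, k))
bound_closed_form = bound_73(level, s, sigma2, k)
```

Perturbing `q_star` in either direction increases the exponent, which is consistent with (73) being the tightest bound obtainable from (72).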
Claim ii. Using Claim i. and letting $u = \ell/s$, we have

$$\begin{aligned}
E|\tau_\ell - \ell/s|^p &= \int_0^\infty P(|\tau_\ell - u| \ge z)\, d(z^p) \\
&\le 2\int_0^\infty e^{-s^2 z^2/(2\sigma^2(u+z))}\, d(z^p) \\
&\le 2\int_0^\infty e^{-s^2 z^2/(4\sigma^2 \max\{u,z\})}\, d(z^p) \\
&= 2(I_1 + I_2),
\end{aligned} \tag{74}$$

where the first inequality follows from Claim i. and where

$$I_1 = \int_0^u e^{-s^2 z^2/(4\sigma^2 u)}\, d(z^p), \qquad I_2 = \int_u^\infty e^{-s^2 z/(4\sigma^2)}\, d(z^p).$$

For $I_1$, the change of variable $z = (4\sigma^2 u/s^2)^{1/2} v$ yields

$$I_1 = \left(\frac{4\sigma^2 u}{s^2}\right)^{p/2} \int_0^{(s^2 u/(4\sigma^2))^{1/2}} e^{-v^2}\, d(v^p) \le \left(\frac{4\sigma^2 \ell}{s^3}\right)^{p/2} \int_0^\infty e^{-v^2}\, d(v^p) = k_1 \ell^{p/2}, \tag{75}$$

where $0 < k_1 < \infty$ is a constant that depends on $s$, $p$, and $\sigma^2$.

For $I_2$, the change of variables $z = v^{1/p}$ and $v = t s^{-2p}$ yield

$$\begin{aligned}
I_2 &= \int_{u^p}^\infty e^{-s^2 v^{1/p}/(4\sigma^2)}\, dv \\
&= s^{-2p} \int_{(s^2 u)^p}^\infty e^{-t^{1/p}/(4\sigma^2)}\, dt \\
&\le e^{-s^2 u/(8\sigma^2)}\, s^{-2p} \int_{(s^2 u)^p}^\infty e^{-t^{1/p}/(8\sigma^2)}\, dt \\
&\le s^{-2p} \int_0^\infty e^{-t^{1/p}/(8\sigma^2)}\, dt = k_2,
\end{aligned} \tag{76}$$

where $0 < k_2 < \infty$ is a constant that depends on $s$, $p$, and $\sigma^2$. From (74), (75), and (76)

$$E|\tau_\ell - \ell/s|^p \le k_3(k_4 + \ell^{p/2})$$

for some constants $k_3$ and $k_4$ that depend on $s$, $p$, and $\sigma^2$. This yields the desired result.

Claim iii: See [15, Theorem 2.5]. ■

Acknowledgments

The authors are grateful to the reviewers and to the Associate Editor for their insightful and detailed comments on the manuscript, and for questioning the non-asymptotic behavior of one of the stopping rules, which prompted the investigation of the complementary stopping rule. The authors are also indebted to Milad Sefidgaran.

Appendix

Simulation - noisy observations: To numerically evaluate $E|\eta - \tau_\ell|$ for each of the three estimators $\eta$ considered (the two stopping rules and the constant $\ell/s$), for each given value of $\ell$ we generated $n$ samples of $(X, Y)$, and computed the corresponding empirical sums

$$s_n = \frac{1}{n}\sum_{i=1}^{n} |\eta(i) - \tau_\ell(i)|$$

where $(\eta(i), \tau_\ell(i))$ is the value of $(\eta, \tau_\ell)$ for the $i$-th sample of $(X, Y)$. (To be precise, we sequentially generated $(X_1, Y_1), (X_2, Y_2), \ldots$, until both $\tau_\ell$ and $\eta$ had stopped. So the generated samples $(X, Y)$ are of variable length.) Letting

$$C_1(\ell, p) = C_1(\ell, \varepsilon, s, p)$$

be the constant defined in Theorem 2 with $\varepsilon = .5$ and $s = 10$, Chebyshev's inequality gives the sufficient condition on the number of samples $n$

$$n \ge \frac{\mathrm{Var}(\eta - \tau_\ell)}{\delta^3\, C_1(\ell,1)^2} \tag{77}$$

in order to have

$$P\left(\frac{1}{C_1(\ell,1)}\,\Big|s_n - E|\eta - \tau_\ell|\Big| \le \delta\right) \ge 1 - \delta. \tag{78}$$

To use (77), we need to evaluate $\mathrm{Var}(\eta - \tau_\ell)$. To do this, observe that $E\eta \approx E\tau_\ell \approx \ell/s$ for each of the three estimators (these approximations become equalities if we ignore overshoot). So we have

$$\mathrm{Var}(\eta - \tau_\ell) \approx E|\eta - \tau_\ell|^2 = \begin{cases} (1+o(1))\,C_1(\ell, 2) & \text{for either of the two stopping rules} \\ (1+o(1))\,(\ell/s^3) & \text{for } \eta = \ell/s \end{cases} \tag{79}$$

where the equality follows from Theorem 2 and the corresponding result for $\eta = \ell/s$. Combining (77) together with (79) gives

$$n \ge \begin{cases} \dfrac{\pi}{2\delta^3} & \text{for either of the two stopping rules} \\[2ex] \dfrac{5\pi}{2\delta^3} & \text{for } \eta = \ell/s \end{cases} \tag{80}$$
as a reasonable condition on $n$ for (78) to hold. In Fig. 1, $n = 10{,}000$, which guarantees roughly $\delta = .05$ for either of the two stopping rules and $\delta = .1$ for $\eta = \ell/s$.

Finally note that, for small values of $\ell$, the contribution due to overshoot cannot be neglected and Theorem 2 is loose. So in this regime the bounds (80) must be taken with a grain of salt.

Simulation - delayed observations: We proceeded similarly as in the previous section. We generated $n$ samples $X$, computed the corresponding empirical sums $s_n$ with $\eta = \eta^*_d$, and finally used the Chebyshev-related inequality (77) with $\mathrm{Var}(\eta - \tau_\ell) = C_2(d, s, 2)$ and $C_1(\ell, 1)$ replaced by $C_2(d, s, 1)$ to obtain

$$n \ge \frac{C_2(d,s,2)}{\delta^3\, C_2(d,s,1)^2} = \frac{\pi}{2\delta^3} \tag{81}$$

as a reasonable condition on $n$ to achieve $\delta$ precision. In Fig. 2, $n = 100{,}000$, which guarantees a precision of $\delta = .03$.
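The sample-size rule (81) is easy to evaluate directly. The sketch below assumes that $C_2(d,s,p) = (d^{p/2}/s^p)\,E|N|^p$, the constant appearing in the bounds of the proof of Theorem 5, with $E|N| = \sqrt{2/\pi}$ and $EN^2 = 1$ for a standard Gaussian $N$. Under that assumption the ratio in (81) does not depend on $d$ or $s$. Function names and the sample arguments are illustrative.

```python
import math

# Absolute moments of a standard Gaussian N for p = 1 and p = 2.
GAUSS_ABS_MOMENT = {1: math.sqrt(2.0 / math.pi), 2: 1.0}

def c2(d, s, p):
    # Assumed form of the constant: (d^(p/2) / s^p) * E|N|^p, for p in {1, 2}
    return (d ** (p / 2.0) / s ** p) * GAUSS_ABS_MOMENT[p]

def required_samples(delta, d, s):
    # Right-hand side of (81): C2(d,s,2) / (delta^3 * C2(d,s,1)^2)
    return c2(d, s, 2) / (delta ** 3 * c2(d, s, 1) ** 2)

def precision_for(n):
    # Inverts n = pi / (2 * delta^3) to get the precision delta reached with n samples
    return (math.pi / (2.0 * n)) ** (1.0 / 3.0)

delta_fig2 = precision_for(100_000)   # sample count quoted for Fig. 2
delta_fig1 = precision_for(10_000)    # sample count quoted for Fig. 1
```

With these formulas, $n = 100{,}000$ corresponds to a precision of about $.025$ and $n = 10{,}000$ to about $.054$, in line with the rounded values quoted in the text.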